
    INFORMATION INTEGRATION APPROACHES FOR INVESTIGATING ESTROGEN RECEPTOR MEDIATED TRANSCRIPTION

    Estrogen plays essential roles in normal physiology and in disease. Its effects are mainly mediated through two intracellular estrogen receptors, ERα and ERβ, which belong to a family of nuclear receptors (NRs) that function as transcription regulators. In the first part of this thesis, we aim to derive a holistic view of the transcription machinery at estrogen-responsive genes and to reveal different mechanisms of estrogen-mediated transcription regulation. To achieve this, we integrated and systematically dissected a variety of genome-wide high-throughput datasets, including gene expression arrays, ChIP-seq, GRO-seq, and ChIA-PET. Our analyses led to the following novel findings. In the absence of the ligand, most estrogen-responsive genes assumed a high-order chromatin configuration that involved Pol II, ERα, and ERα-pioneer factors. Without the ligand, estrogen-induced genes showed active transcription at promoters but failed to elongate into gene bodies, and this pause was lifted after estrogen treatment. In contrast, estrogen-repressed genes showed coordinated transcription at promoters and gene bodies both in the absence and in the presence of estrogen. Through information integration, we inferred that, for estrogen-repressed genes, the majority of the high-order chromatin complexes containing actively transcribed genes were disrupted after estrogen treatment. These analyses led to the hypothesis that one mechanism of estrogen-mediated repression is the disruption of the original transcription-favoring chromatin structures. Further, nuclear receptors such as ERs interact with co-regulators to regulate gene transcription. Understanding the mechanism of action of co-regulator proteins, which do not bind DNA directly but exert their effects by binding to transcription factors, is important for the study of normal physiology as well as diseased conditions. However, because ChIP-seq of co-regulators detects indirect protein-DNA interactions, its signals can be relatively weak, and biologically meaningful interactions remain difficult to identify. In the second part of this thesis, we investigated and compared different machine learning approaches for integrating multiple types of genomic and transcriptomic information derived from our experiments and from public databases. This helped us overcome the difficulty of identifying functional DNA binding sites of the co-regulator SRC-1 in the context of estrogen response. Our results indicate that supervised learning with the naïve Bayes algorithm significantly enhanced the peak calling of weak ChIP-seq signals and outperformed other machine learning algorithms. Our integrative approach revealed many potential ERα/SRC-1 DNA binding sites that would otherwise be missed by conventional peak calling algorithms with default settings.
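As a rough illustration of how a naïve Bayes classifier can combine a weak co-regulator signal with other evidence, the sketch below fits a Gaussian naïve Bayes model to candidate regions. It is not the thesis's implementation: the abstract does not specify the likelihood model or the features, so the Gaussian assumption, the three feature columns, and all numbers are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(features, labels, var_floor=1e-6):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(features, labels):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor keeps the variance strictly positive for constant features.
        variances = [
            sum((x - m) ** 2 for x in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(labels), means, variances)
    return model


def log_joint(model, row):
    """Return log P(class) + sum of log P(feature | class) for each class."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        scores[label] = score
    return scores


def predict(model, row):
    """Pick the class with the highest log joint probability."""
    scores = log_joint(model, row)
    return max(scores, key=scores.get)


# Hypothetical features per candidate region (made-up values):
# [co-regulator ChIP-seq signal, ERα ChIP-seq signal, motif score]
train_x = [
    [2.1, 8.0, 0.9], [1.8, 7.5, 0.8], [2.5, 9.1, 0.7],  # labeled functional
    [1.9, 0.5, 0.1], [2.2, 0.8, 0.2], [1.7, 0.3, 0.1],  # labeled background
]
train_y = [1, 1, 1, 0, 0, 0]
nb_model = fit_gaussian_nb(train_x, train_y)
```

In this toy data the co-regulator signal alone barely separates the two classes; the classification is driven by the additional features, which is the intuition behind integrating several data types.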

    N-gram analysis of 970 microbial organisms reveals presence of biological language models

    Background: It has been suggested previously that genome and proteome sequences show characteristics typical of natural-language texts, such as "signature-style" word usage indicative of authors or topics, and that algorithms originally developed for natural language processing may therefore be applied to genome sequences to draw biologically relevant conclusions. Following this approach of "biological language modeling", statistical n-gram analysis was previously applied to the comparative analysis of whole proteome sequences of 44 organisms. It was shown that a few particular amino acid n-grams are abundant in one organism but occur very rarely in other organisms, thereby serving as genome signatures. At that time the proteomes of only 44 organisms were available, which limited the generalization of this hypothesis. Today nearly 1,000 genome sequences and corresponding translated sequences are available, making it feasible to test the existence of biological language models over the evolutionary tree.
    Results: We studied whole proteome sequences of 970 microbial organisms using n-gram frequencies and cross-perplexity, employing the Biological Language Modeling Toolkit and the Patternix Revelio toolkit. Genus-specific signatures were observed even in a simple unigram distribution. When the statistical n-gram model of one organism was taken as a reference and the cross-perplexity of all other microbial proteomes was computed against it, cross-perplexity was found to be predictive of branch distance in the phylogenetic tree. For example, a 4-gram model from the proteome of Shigella flexneri 2a, which belongs to the class Gammaproteobacteria, showed a self-perplexity of 15.34, while the cross-perplexity of other organisms was in the range of 15.59 to 29.5 and was proportional to their branching distance in the evolutionary tree from S. flexneri. The organisms of this genus, which happen to be pathotypes of E. coli, also have the closest perplexity values with E. coli.
    Conclusion: Whole proteome sequences of microbial organisms have been shown to contain particular n-gram sequences that are abundant in one organism but occur very rarely in other organisms, thereby serving as proteome signatures. Further, it has also been shown that perplexity, a statistical measure of similarity of n-gram composition, can be used to predict evolutionary distance within a genus in the phylogenetic tree.
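The cross-perplexity comparison described in the Results can be sketched as follows. This is a minimal illustration, not the Biological Language Modeling Toolkit: the add-one smoothing, the 20-letter alphabet, the function names, and the toy sequences are all assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def train_ngram_model(proteome, n):
    """Count n-grams and their (n-1)-residue contexts over all proteins."""
    grams, contexts = Counter(), Counter()
    for protein in proteome:
        for i in range(len(protein) - n + 1):
            gram = protein[i:i + n]
            grams[gram] += 1
            contexts[gram[:-1]] += 1
    return grams, contexts


def perplexity(model, proteome, n, alpha=1.0):
    """Perplexity of `proteome` under `model`, with additive smoothing.

    Scoring the reference proteome itself gives the self-perplexity;
    scoring a different proteome gives the cross-perplexity.
    """
    grams, contexts = model
    vocab = len(AMINO_ACIDS)
    log_prob, total = 0.0, 0
    for protein in proteome:
        for i in range(len(protein) - n + 1):
            gram = protein[i:i + n]
            # Smoothed conditional probability of the last residue given context.
            p = (grams[gram] + alpha) / (contexts[gram[:-1]] + alpha * vocab)
            log_prob += math.log2(p)
            total += 1
    if total == 0:
        raise ValueError("proteome has no sequence of length >= n")
    return 2 ** (-log_prob / total)


# Toy sequences, invented purely for illustration.
reference = ["MKVLAAGIVMKVLAAGIV", "MKVLAAGIV"]
other = ["WWPPHHCCWWPPHHCC"]

ref_model = train_ngram_model(reference, n=2)
self_pp = perplexity(ref_model, reference, n=2)
cross_pp = perplexity(ref_model, other, n=2)
```

A lower cross-perplexity means the second proteome's n-gram composition is better predicted by the reference model, which is the quantity the abstract relates to branch distance.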

    Isolated BAP1 Genomic Alteration in Malignant Pleural Mesothelioma Predicts Distinct Immunogenicity with Implications for Immunotherapeutic Response

    Malignant pleural mesothelioma (MPM), an aggressive cancer of the mesothelial cells lining the pleural cavity, lacks effective treatments. Multiple somatic mutations and copy number losses in the tumor suppressor genes (TSGs) BAP1, CDKN2A/B, and NF2 are frequently associated with MPM. The impact of single versus multiple genomic alterations of TSGs on MPM biology, the immune tumor microenvironment, clinical outcomes, and treatment responses is unknown. Tumors with genomic alterations in BAP1 alone were associated with longer overall patient survival than tumors with CDKN2A/B and/or NF2 alterations, with or without BAP1 alterations, and formed a distinct immunogenic subtype with altered transcription factor and pathway activity patterns. CDKN2A/B genomic alterations consistently contributed to an adverse clinical outcome. Because genomic alteration of BAP1 alone was associated with the PD-1 therapy response signature and with higher LAG3 and VISTA gene expression, it might be a candidate marker for immune checkpoint blockade therapy. Our results on the impact of TSG genotypes on MPM and the correlations between TSG alterations and molecular pathways provide a foundation for developing individualized MPM therapies.
